131 research outputs found

Accessing spoken interaction through dialogue processing [online]

Author: Ries Klaus
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2002

Summary: Our lives, our achievements, and our environment are currently all documented in written language. The rapid advance in the technical means of recording, storing, and playing back audio, images, and video can be used to support, supplement, or even replace the written documentation of human communication, for example meetings. These new technologies can enable us to capture information that would otherwise be lost, to lower the cost of documentation, and to enrich high-quality documents with audiovisual material. Indexing such recordings is the core technology for realizing this potential. This work presents effective alternatives to keyword-based indices that restrict the search space and can in part be computed with simple means. Speech documents can be indexed at several levels. Stylistically, a document belongs to a particular database, which can be determined automatically with high accuracy from very simple features. This kind of classification can reduce the search space by a factor on the order of 410. Applying topical features for text classification to a news database yields a reduction by a factor of 18. Because speech documents can be very long, they must be divided into topical segments. A new probabilistic approach and new features (speaker initiative and style) deliver results comparable to or better than traditional keyword-based approaches. These topical segments can be characterized by their predominant activity (storytelling, discussing, planning, ...), which can be detected by a neural network. Detection rates are limited, however, because humans also determine these activities only imprecisely.
A maximum search space reduction by a factor of 6 is theoretically possible on the data used. A topical classification of these segments was also carried out on one database, but the detection rates for this index are low. At the level of individual utterances, dialogue acts such as statements, questions, backchannels (aha, ach ja, echt?, ...), etc. can be recognized with a discriminatively trained hidden Markov model. This method can be extended to recognize short sequences such as question/answer games (dialogue games). Dialogue acts and games can be used to build classifiers for global speaking styles. Likewise, a user might remember a particular dialogue act sequence and try to find it again in a graphical representation. In a study with very pessimistic assumptions, users identified one of four similar and equally probable conversations with an accuracy of ~43% using a graphical representation of activity. Dialogue acts could be equally useful in this scenario, but because of the small amount of data the user study could not settle the question. The study did not, however, show any effect for detailed basic features such as formality and speaker identity.
Abstract: Written language is one of our primary means for documenting our lives, achievements, and environment. Our capabilities to record, store and retrieve audio, still pictures, and video are undergoing a revolution and may support, supplement or even replace written documentation. This technology enables us to record information that would otherwise be lost, lower the cost of documentation and enhance high-quality documents with original audiovisual material. The indexing of the audio material is the key technology to realize those benefits.
This work presents effective alternatives to keyword-based indices which restrict the search space and may in part be calculated with very limited resources. Indexing speech documents can be done at various levels. Stylistically, a document belongs to a certain database, which can be determined automatically with high accuracy using very simple features. The resulting factor in search space reduction is on the order of 410, while topic classification yielded a factor of 18 in a news domain. Since documents can be very long, they need to be segmented into topical regions. A new probabilistic segmentation framework as well as new features (speaker initiative and style) prove to be very effective compared to traditional keyword-based methods. At the topical segment level, activities (storytelling, discussing, planning, ...) can be detected using a machine learning approach with limited accuracy; however, even human annotators do not annotate them very reliably. A maximum search space reduction factor of 6 is theoretically possible on the databases used. A topical classification of these regions has been attempted on one database; the detection accuracy for that index, however, was very low. At the utterance level, dialogue acts such as statements, questions, backchannels (aha, yeah, ...), etc. are recognized using a novel discriminatively trained HMM procedure. The procedure can be extended to recognize short sequences such as question/answer pairs, so-called dialogue games. Dialogue acts and games are useful for building classifiers for speaking style. Similarly, a user may remember a certain dialogue act sequence and may search for it in a graphical representation. In a study with very pessimistic assumptions, users were able to pick one out of four similar and equiprobable meetings correctly with an accuracy of ~43% using graphical activity information. Dialogue acts may be useful in this situation as well, but the sample size did not allow final conclusions to be drawn.
However, the user study fails to show any effect for detailed basic features such as formality or speaker identity.

KITopen

Unmanned Aerial Vehicle (UAV) for monitoring soil erosion in Morocco

Author: D'Oleire-Oltmanns Sebastian
Marzolff Irene
Peter Klaus Daniel
Ries Johannes B.
Publication date: 07/11/2012

This article presents an environmental remote sensing application using a UAV that is specifically aimed at reducing the data gap between field scale and satellite scale in soil erosion monitoring in Morocco. A fixed-wing aircraft of type Sirius I (MAVinci, Germany) equipped with a digital system camera (Panasonic) is employed. UAV surveys are conducted over different study sites with varying extents and flying heights in order to provide both very high resolution site-specific data and lower-resolution overviews, thus fully exploiting the large potential of the chosen UAV for multi-scale mapping purposes. Depending on the scale and area coverage, two different approaches for georeferencing are used, based on high-precision GCPs or the UAV’s log file with exterior orientation values, respectively. The photogrammetric image processing enables the creation of Digital Terrain Models (DTMs) and ortho-image mosaics with very high resolution on a sub-decimetre level. The created data products were used for quantifying gully and badland erosion in 2D and 3D as well as for the analysis of the surrounding areas and landscape development for larger extents.

Multidisciplinary Digital Publishing Institute

Hochschulschriftenserver - Universität Frankfurt am Main

What's in a word: learning base units in Japanese for speech recognition

Author: Mayfield Tomokiyo Laura
Ries Klaus
Publication date: 02/08/2007

KITopen

Monitoring soil erosion in the Souss basin, Morocco, with a multiscale object-based remote sensing approach using UAV and satellite data

Author: Aït Hssaïne Ali
D'Oleire-Oltmanns Sebastian
Marzolff Irene
Peter Klaus Daniel
Ries Johannes B.
Publication date: 02/11/2011

This article presents a multiscale approach for detecting and monitoring soil erosion phenomena (i.e. gully erosion) in the agro-industrial area around the city of Taroudannt, Souss basin, Morocco. The study area is characterized as semi-arid with an annual average precipitation of 200 mm. Water scarcity, high population dynamics and changing land use towards huge areas of irrigation farming present numerous threats to sustainability. The agro-industry produces citrus fruits and vegetables in monocropping, mainly for the European market. Badland areas strongly affected by gully erosion border the agricultural areas as well as residential areas. To counteract the significant loss of land, land-leveling measures are undertaken to create space for plantations and greenhouses. In order to develop sustainable approaches to limit gully growth, the detection and monitoring of gully systems is fundamental. Specific gully sites are monitored with an unmanned aerial vehicle (UAV) taking small-format aerial photographs (SFAP). This enables extremely high-resolution analysis (SFAP resolution: 2-10 cm) of the actual size of the gully channels as well as a detailed continued surveillance of their growth. Transferring the methodology to a larger scale using Quickbird satellite data (resolution: 60 cm) makes possible a large-scale analysis of the whole area around the city of Taroudannt (area extent: ca. 350 km²). The results will then reveal possible relationships between gully growth and agro-industrial management and may even illustrate further interdependencies. The main objective is the identification of areas with high gully-erosion risk due to non-sustainable land use and the development of mitigation strategies for the study area.

Crossref

Hochschulschriftenserver - Universität Frankfurt am Main

Class phrase models for language modeling

Author: Buoe Finn Dag
Ries Klaus
Waibel Alex
Publication date: 02/08/2007

KITopen

Recognition of conversational telephone speech using the Janus speech engine

Author: Finke Michael
Ries Klaus
Waibel Alex
Westphal Martin
Zeppenfeld Torsten
Publication date: 02/08/2007

KITopen

Advances in automatic meeting record creation and access

Author: Bett Michael
Metze Florian
Ries Klaus
Schaaf Thomas
Schultz Tanja
Soltau Hagen
Waibel Alex
Yu Hua
Zechner Klaus
Publication date: 16/01/2008

KITopen

The Karlsruhe-Verbmobil speech recognition engine

Author: Finke Michael
Geutner Petra
Hild Hermann
Kemp Thomas
Ries Klaus
Westphal Martin
Publication date: 02/08/2007

KITopen

Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

Author: Andreas Stolcke
Berger Adam L
Carletta Jean
Carol Van Ess-Dykema
Daniel Jurafsky
Dermatas Evangelos
Elizabeth Shriberg
Grosz Barbara J
Hirschberg Julia B
Klaus Ries
Marie Meteer
Noah Coccaro
Paul Taylor
Rachel Martin
Rebecca Bates
Publication date: 01/01/2000

We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as Statement, Question, Backchannel, Agreement, Disagreement, and Apology. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error.
Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling changed
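
As a rough illustration of the decoding idea in the abstract above (a dialogue act n-gram constraining the act sequence, combined with per-utterance evidence), the following is a minimal Viterbi sketch in Python. It is not the authors' implementation: it assumes a bigram dialogue grammar, two acts only, and made-up probabilities. The function name `viterbi` and the `toy_*` variables are hypothetical and chosen for illustration.

```python
import math

def viterbi(obs_loglik, log_trans, log_init):
    """Return the most likely dialogue act sequence under a bigram model.

    obs_loglik: list of dicts, one per utterance: act -> log P(evidence | act)
    log_trans:  dict: (prev_act, act) -> log P(act | prev_act)
    log_init:   dict: act -> log P(act) for the first utterance
    """
    acts = list(log_init)
    # best[a] = log score of the best path that ends in act a
    best = {a: log_init[a] + obs_loglik[0][a] for a in acts}
    back = []  # one dict of back-pointers per utterance after the first
    for frame in obs_loglik[1:]:
        new_best, pointers = {}, {}
        for a in acts:
            prev = max(acts, key=lambda p: best[p] + log_trans[(p, a)])
            new_best[a] = best[prev] + log_trans[(prev, a)] + frame[a]
            pointers[a] = prev
        best = new_best
        back.append(pointers)
    # Trace back from the best final act
    path = [max(acts, key=lambda a: best[a])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy numbers (assumptions, not taken from the paper)
S, Q = "Statement", "Question"
toy_init = {S: math.log(0.6), Q: math.log(0.4)}
toy_trans = {
    (S, S): math.log(0.7), (S, Q): math.log(0.3),
    (Q, S): math.log(0.8), (Q, Q): math.log(0.2),
}
toy_obs = [
    {S: math.log(0.2), Q: math.log(0.8)},  # utterance 1 looks like a question
    {S: math.log(0.9), Q: math.log(0.1)},  # utterance 2 looks like a statement
    {S: math.log(0.5), Q: math.log(0.5)},  # utterance 3 is ambiguous
]
print(viterbi(toy_obs, toy_trans, toy_init))
```

In this toy run the third utterance carries no evidence either way, so the bigram grammar alone decides its label. In the approach the abstract describes, the per-utterance scores would come from word n-grams, decision trees, or neural networks rather than fixed numbers.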

arXiv.org e-Print Archive

CiteSeerX

Crossref

Edinburgh Research Archive

Institutional Repository for Minnesota State University, Mankato